Show the code
import pandas as pd
import numpy as np
from lets_plot import *
LetsPlot.setup_html(isolated_frame=True)Course DS 250
[MIGUEL SMITH]
For Project 1 the answer to each question should include a chart and a written response. The years labels on your charts should not include a comma. At least two of your charts must include reference marks.
How does your name at your birth year compare to its use historically?
Looks like my name was increasing in popularity until its peak in 2006 and 2000 rode that wave. Since 2006 it has decreased in popularity.
# Include and execute your code here
miguel_data = df.query('name == "Miguel"')
(ggplot(data=miguel_data, mapping=aes(x='year', y='Total')) +
geom_line(color='blue') +
geom_point(color='blue')+
geom_vline(xintercept=2000,color='red',linetype='dashed')+
geom_text(x=1980,y=3917,label = "Year I Was Born -->") +
labs(
title='Popularity of the Name Miguel Over Time',
x='Year',
y='Babies Named Miguel'
)+
scale_x_continuous(format="d")
).show()
### Looks like my name was increasing in popularity until its peak in 2006 and 2000 rode that wave. Since 2006 it has decreased in popularity. If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess?
If I was talking to a person named Brittany on the phone I would guess their age to be around 33-37. I would not guess their age to be greater or less than that.
# Include and execute your code here
brittany_data = df.query('name == "Brittany"')
(ggplot(data=brittany_data, mapping=aes(x='year', y='Total')) +
geom_line(color='blue') +
geom_point(color='blue')+
labs(
title='Popularity of the name Brittany Over Time',
x='Year',
y='Babies Named Brittany'
)+
scale_x_continuous(format="d")
).show()
### If I was talking to a person named Brittany on the phone I would guess their age to be around 33-37. I would not guess their age to be greater or less than that. Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names in a single chart. What trends do you notice?
Mary was an extremely popular name before 2000, more than the others. This makes sense knowing christianity was the predominate religion in the USA, perhaps we see this popularity due to Catholicism and their adoration for Mary. An observation presented in class was the Ku Klux Klan reemergence in the 1920s, which resulted in anti-immagration, anti-Catholocism and anti-Semitism. That could have caused the decline we see from ~1920-1936.
# Include and execute your code here
christian_names = df.query('name in ["Mary","Martha","Peter", "Paul"] ')
(ggplot(data=christian_names, mapping=aes(x='year', y='Total',color = 'name')) +
geom_line(color='grey') +
geom_point(size=3) +
labs(
title='Popularity of Biblical Names Over Time',
x='Year',
y='Babies Named',
color = 'Name'
)+
scale_x_continuous(format="d")+
xlim(1920,2000)
).show()
### Mary was an extremely popular name before 2000, more than the others. This makes sense knowing christianity was the predominate religion in the world, perhaps we see this popularity due to Catholicism and their adoration for Mary. Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage?
Noting when the first Star Wars movies were released we can make some inferences on how the popularity of the name Luke came to be. We can see slight jumps after the release of A New Hope in 1977 but nothing super significant. We can note the same after Return of the Jedi in 1983. What could be an explanation of the spike in popularity from 1995 and on is that the young people who grew up watching Star Wars could have named their kid(s) Luke when they started having children a few years after the first release.
luke_data = df.query('name =="Luke"')
(ggplot(data=luke_data,
mapping=aes(x='year', y='Total')) +
geom_line() +
geom_point(size=3) +
scale_x_continuous(format="d") +
geom_vline(xintercept=1977,color='red',linetype='dashed')+
geom_text(x=1957, y = 6000, label = 'A New Hope Released')+
geom_vline(xintercept=1983,color='green',linetype='dashed')+
geom_text(x=2000, y = 900, label = 'Return of Jedi Released')+
labs(
title='A New Hope(ful name)',
x='Year',
y='Number of Babies Named Luke',
color='Name'
)
).show()
### For Luke, it does appear that the movie influenced the popularity of the name. It appears that the name might have become that much popular when the generation who grew up with Star Wars had children instead of immediately after the first movie was released. I tried to see if the names from Disney's Frozen had any popularity increase after 2013 but they only increased slightly.Reproduce the chart Elliot using the data from the names_year.csv file.
type your results and analysis here
# Include and execute your code here
Elliot_data = df.query('name =="Elliot"')
(ggplot(data= Elliot_data,
mapping = aes(x='year',y='Total',color='name'))+
geom_line()+ # Remove the fixed color here to let the aesthetic work
geom_vline(xintercept = 1982,color='red',linetype = 'dashed')+
geom_vline(xintercept = 2002,color='red',linetype = 'dashed')+
geom_vline(xintercept = 1985,color='red',linetype = 'dashed')+
geom_text(x=1972,y=1200,label = "First Release",color='black')+
geom_text(x=1995,y=1200,label = "Second Release",color='black')+
geom_text(x=2012,y=1200,label = "Third Release",color='black')+
scale_x_continuous(format='d')+
xlim(1950,2025)+
labs(
title = 'Elliot... What?',
x='Year',
y='Total',
color= 'Name'
) +
theme(legend_position = 'right')+
scale_color_manual(values=['#800080'])
)